Fixes Issue #328 #329

Merged
merged 4 commits into from
May 1, 2023
@RupertAvery (Contributor) commented Apr 29, 2023

Fixes #328

Parse the keyword and text directory using utf8 encoding if PNG Chunk Type is iTXt
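
To illustrate why the encoding matters here, the following is a minimal sketch, not the library's code. The `ITxtDecodingDemo` class and its `Decode` helper are hypothetical names introduced for illustration; only the standard `System.Text.Encoding` API is used. It decodes the same UTF-8 bytes once as Latin1 and once as UTF-8.

```csharp
using System;
using System.Text;

public static class ITxtDecodingDemo
{
    // Hypothetical helper: decode raw bytes with the given encoding.
    public static string Decode(byte[] bytes, Encoding encoding) =>
        encoding.GetString(bytes);

    public static void Main()
    {
        // "Café" encoded as UTF-8: the "é" occupies two bytes (0xC3 0xA9).
        byte[] utf8Bytes = { 0x43, 0x61, 0x66, 0xC3, 0xA9 };

        var latin1 = Encoding.GetEncoding("ISO-8859-1");

        // Latin1 maps every byte to one character, so the two-byte "é"
        // becomes the two characters U+00C3 and U+00A9 ("Ã©").
        string wrong = Decode(utf8Bytes, latin1);

        // UTF-8 reassembles the two bytes into the single character U+00E9.
        string right = Decode(utf8Bytes, Encoding.UTF8);

        Console.WriteLine(wrong.Length); // 5
        Console.WriteLine(right.Length); // 4
    }
}
```

For pure ASCII keywords both decodings agree, which may be why the problem only shows up with non-ASCII metadata.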

@drewnoakes (Owner) left a comment

Thanks, this looks great. I'll try it on the regression test data set before merging.

@@ -248,7 +249,7 @@ private static IEnumerable<Directory> ProcessChunk(PngChunk chunk)
             else if (chunkType == PngChunkType.iTXt)
             {
                 var reader = new SequentialByteArrayReader(bytes);
-                var keyword = reader.GetNullTerminatedStringValue(maxLengthBytes: 79).ToString(_latin1Encoding);
+                var keyword = reader.GetNullTerminatedStringValue(maxLengthBytes: 79).ToString(_utf8Encoding);

@drewnoakes (Owner) commented on this change:

There's actually now an issue slightly below here. The bytesLeft value was based on the length of the string in bytes, which for latin1 is the same as the length of the string in characters. With UTF-8 that's not the case. I'll patch this up and push to your PR.
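
The mismatch described above can be seen in a small sketch. This is an illustration under stated assumptions, not the actual patch: `KeywordLengthDemo` and `EncodedLength` are hypothetical names, and the real `bytesLeft` calculation is not shown in this thread. The point is only that a decoded string's character count equals its byte count in Latin1 but not necessarily in UTF-8.

```csharp
using System;
using System.Text;

public static class KeywordLengthDemo
{
    // Hypothetical helper, not the library's code: the number of bytes a
    // null-terminated keyword occupies in the chunk, terminator included.
    public static int EncodedLength(string keyword, Encoding encoding) =>
        encoding.GetByteCount(keyword) + 1;

    public static void Main()
    {
        string keyword = "Caf\u00E9"; // "Café"
        var latin1 = Encoding.GetEncoding("ISO-8859-1");

        // Character count of the decoded string.
        Console.WriteLine(keyword.Length);                        // 4

        // Latin1: one byte per character, so bytes == characters.
        Console.WriteLine(EncodedLength(keyword, latin1));        // 5 (4 bytes + terminator)

        // UTF-8: "é" takes two bytes, so the keyword is one byte longer.
        Console.WriteLine(EncodedLength(keyword, Encoding.UTF8)); // 6 (5 bytes + terminator)
    }
}
```

Any offset arithmetic that subtracts `keyword.Length` would therefore come out one byte too large for this keyword once UTF-8 is used; counting the encoded bytes avoids that.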

@drewnoakes (Owner) commented May 1, 2023

I ran this against the regression test data set and it now successfully parses a bunch of previously broken values. I see no downside to this anywhere. Great stuff, thanks!

drewnoakes added a commit to drewnoakes/metadata-extractor that referenced this pull request May 1, 2023
This is a port of a fix from the .NET library in drewnoakes/metadata-extractor-dotnet#329

PNG chunks of type `iTXt` should have keywords and values decoded using UTF-8, not Latin1 encoding.
drewnoakes added a commit to drewnoakes/metadata-extractor-images that referenced this pull request May 1, 2023
@drewnoakes drewnoakes merged commit 7da96dd into drewnoakes:master May 1, 2023
Development

Successfully merging this pull request may close these issues.

Reading UTF-8 from iTXt PNGChunk